OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Query Translation and Expansion for Searching Normal and OCR-Degraded Arabic Text

Internal identifier: 000922 (Main/Exploration); previous: 000921; next: 000923


Authors: Tarek Elghazaly [Egypt]; Aly Fahmy [Egypt]

Source :

RBID : ISTEX:B57F71A720C3286257DF773DF9FD0D9AA1EB23F2

Abstract

This paper presents a novel model for English/Arabic query translation to search Arabic text, then expands the Arabic query to handle OCR-degraded Arabic text. The model detects and translates word collocations, translates single words, transliterates names, and disambiguates translations and transliterations through several approaches. It also expands the query with the expected OCR errors generated by the Arabic OCR-error simulation model proposed in the paper. The query translation and expansion model is supported by resources proposed in the paper, such as a Word Collocations Dictionary, Single Words Dictionaries, a Modern Arabic corpus, and other tools. The model achieves high accuracy in translating queries from English to Arabic, resolving translation and transliteration ambiguities; with orthographic query expansion, it also achieves a high degree of accuracy in handling OCR errors.
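
The abstract does not describe how the OCR-error expansion is implemented. As a rough illustration only, the general idea of orthographic query expansion can be sketched as substituting characters according to a confusion table. The table below, the function name `expand_query`, and the `max_errors` parameter are all hypothetical assumptions made for this sketch; they are not taken from the paper and do not represent the authors' simulation model.

```python
from itertools import product

# Hypothetical confusion table: each character maps to characters that an
# OCR engine might output in its place. Illustrative only; not from the paper.
CONFUSIONS = {
    "ب": ["ت", "ث"],
    "ج": ["ح", "خ"],
    "ر": ["ز"],
}


def expand_query(term, confusions=CONFUSIONS, max_errors=1):
    """Return the term plus variants with at most max_errors substitutions."""
    # For every position, the original character comes first, followed by
    # its possible OCR confusions.
    options = [[ch] + confusions.get(ch, []) for ch in term]
    variants = []
    for combo in product(*options):
        # Count how many positions differ from the original term.
        errors = sum(1 for new, old in zip(combo, term) if new != old)
        if errors <= max_errors:
            variants.append("".join(combo))
    return variants


if __name__ == "__main__":
    # The original term is listed first, followed by its variants.
    for variant in expand_query("بر"):
        print(variant)
```

A search engine could then combine the returned variants with OR so that a document containing a misrecognized form still matches. Enumerating every combination grows quickly with term length, so a practical system would likely need to bound or rank the variants.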

DOI: 10.1007/978-3-642-00382-0_39




The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Query Translation and Expansion for Searching Normal and OCR-Degraded Arabic Text</title>
<author>
<name sortKey="Elghazaly, Tarek" sort="Elghazaly, Tarek" uniqKey="Elghazaly T" first="Tarek" last="Elghazaly">Tarek Elghazaly</name>
</author>
<author>
<name sortKey="Fahmy, Aly" sort="Fahmy, Aly" uniqKey="Fahmy A" first="Aly" last="Fahmy">Aly Fahmy</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:B57F71A720C3286257DF773DF9FD0D9AA1EB23F2</idno>
<date when="2009" year="2009">2009</date>
<idno type="doi">10.1007/978-3-642-00382-0_39</idno>
<idno type="url">https://api.istex.fr/document/B57F71A720C3286257DF773DF9FD0D9AA1EB23F2/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000024</idno>
<idno type="wicri:Area/Istex/Curation">000024</idno>
<idno type="wicri:Area/Istex/Checkpoint">000444</idno>
<idno type="wicri:doubleKey">0302-9743:2009:Elghazaly T:query:translation:and</idno>
<idno type="wicri:Area/Main/Merge">000930</idno>
<idno type="wicri:Area/Main/Curation">000922</idno>
<idno type="wicri:Area/Main/Exploration">000922</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Query Translation and Expansion for Searching Normal and OCR-Degraded Arabic Text</title>
<author>
<name sortKey="Elghazaly, Tarek" sort="Elghazaly, Tarek" uniqKey="Elghazaly T" first="Tarek" last="Elghazaly">Tarek Elghazaly</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Égypte</country>
<wicri:regionArea>Faculty of Computers and Information, Cairo University, Giza</wicri:regionArea>
<wicri:noRegion>Giza</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Égypte</country>
</affiliation>
</author>
<author>
<name sortKey="Fahmy, Aly" sort="Fahmy, Aly" uniqKey="Fahmy A" first="Aly" last="Fahmy">Aly Fahmy</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Égypte</country>
<wicri:regionArea>Faculty of Computers and Information, Cairo University, Giza</wicri:regionArea>
<wicri:noRegion>Giza</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Égypte</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2009</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">B57F71A720C3286257DF773DF9FD0D9AA1EB23F2</idno>
<idno type="DOI">10.1007/978-3-642-00382-0_39</idno>
<idno type="ChapterID">39</idno>
<idno type="ChapterID">Chap39</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: This paper presents a novel model for English/Arabic query translation to search Arabic text, then expands the Arabic query to handle OCR-degraded Arabic text. The model detects and translates word collocations, translates single words, transliterates names, and disambiguates translations and transliterations through several approaches. It also expands the query with the expected OCR errors generated by the Arabic OCR-error simulation model proposed in the paper. The query translation and expansion model is supported by resources proposed in the paper, such as a Word Collocations Dictionary, Single Words Dictionaries, a Modern Arabic corpus, and other tools. The model achieves high accuracy in translating queries from English to Arabic, resolving translation and transliteration ambiguities; with orthographic query expansion, it also achieves a high degree of accuracy in handling OCR errors.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Égypte</li>
</country>
</list>
<tree>
<country name="Égypte">
<noRegion>
<name sortKey="Elghazaly, Tarek" sort="Elghazaly, Tarek" uniqKey="Elghazaly T" first="Tarek" last="Elghazaly">Tarek Elghazaly</name>
</noRegion>
<name sortKey="Elghazaly, Tarek" sort="Elghazaly, Tarek" uniqKey="Elghazaly T" first="Tarek" last="Elghazaly">Tarek Elghazaly</name>
<name sortKey="Fahmy, Aly" sort="Fahmy, Aly" uniqKey="Fahmy A" first="Aly" last="Fahmy">Aly Fahmy</name>
<name sortKey="Fahmy, Aly" sort="Fahmy, Aly" uniqKey="Fahmy A" first="Aly" last="Fahmy">Aly Fahmy</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000922 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000922 | SxmlIndent | more

To link to this page in the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:B57F71A720C3286257DF773DF9FD0D9AA1EB23F2
   |texte=   Query Translation and Expansion for Searching Normal and OCR-Degraded Arabic Text
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024